## [1] "Mon Mar 25 15:27:38 2019"
library(readr)
library(data.table)
library(plotly, quietly = TRUE)
Read file of all studies in AACT.
## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
Select only Interventional study_type.
## [1] "Interventional studies: 237892 (79.2%)"
| phase | N |
|---|---|
| Early Phase 1 | 2619 |
| Phase 1 | 29795 |
| Phase 1/Phase 2 | 10063 |
| Phase 2 | 41637 |
| Phase 2/Phase 3 | 4963 |
| Phase 3 | 29662 |
| Phase 4 | 25001 |
| NA | 94152 |
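The filter and per-phase tally above could be sketched with data.table roughly as follows. This is a minimal illustration on a toy table, not the report's actual code; the column names `nct_id`, `study_type`, and `phase` are assumptions about the AACT studies file, and the rows are made up.

```r
library(data.table)

# Toy stand-in for the AACT studies table (hypothetical rows;
# column names are assumptions, not confirmed by the report).
studies <- data.table(
  nct_id     = c("NCT001", "NCT002", "NCT003", "NCT004", "NCT005"),
  study_type = c("Interventional", "Observational", "Interventional",
                 "Interventional", "Interventional"),
  phase      = c("Phase 1", NA, "Phase 2", NA, "Phase 2")
)

# Keep only interventional studies.
interventional <- studies[study_type == "Interventional"]
print(sprintf("Interventional studies: %d (%.1f%%)",
              nrow(interventional),
              100 * nrow(interventional) / nrow(studies)))

# Count studies per phase; a missing phase is kept as its own NA group.
phase_counts <- interventional[, .N, by = phase]
print(phase_counts)
```

Grouping with `by = phase` keeps studies without a phase as an `NA` row, which matches the shape of the table above.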
Read file of all drugs in AACT.

* id is AACT ID.
* Note that one study may involve multiple drugs.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
Select only studies involving drugs.
## [1] "Drug trials: 124421 ; unique NCT_IDs: 124421"
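As a hedged sketch of the drug selection step, the snippet below filters a toy interventions table to drug rows and counts unique names, intervention IDs, and trials. The column names (`id`, `nct_id`, `intervention_type`, `name`) and the `"Drug"` label are assumptions for illustration only.

```r
library(data.table)

# Toy stand-in for the AACT interventions table (hypothetical rows;
# column names and the "Drug" label are assumptions).
interventions <- data.table(
  id                = 1:5,
  nct_id            = c("NCT001", "NCT001", "NCT002", "NCT003", "NCT003"),
  intervention_type = c("Drug", "Drug", "Device", "Drug", "Behavioral"),
  name              = c("aspirin", "placebo", "stent", "aspirin", "counseling")
)

# Keep only drug interventions.
drugs <- interventions[intervention_type == "Drug"]

# One study may involve several drugs, so rows, names, and trials differ.
n_names  <- uniqueN(drugs$name)
n_ids    <- uniqueN(drugs$id)
n_trials <- uniqueN(drugs$nct_id)
print(sprintf("Unique drug names: %d ; unique intervention IDs: %d", n_names, n_ids))
print(sprintf("Drug trials: %d", n_trials))
```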
Merge study metadata with drugs. Counts below are per study-drug row, so a study involving several drugs is counted once per drug.
| phase | N |
|---|---|
| Early Phase 1 | 2615 |
| Phase 1 | 48593 |
| Phase 1/Phase 2 | 13288 |
| Phase 2 | 68850 |
| Phase 2/Phase 3 | 6503 |
| Phase 3 | 49507 |
| Phase 4 | 36331 |
| NA | 29390 |
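A minimal sketch of such a merge, assuming both tables share an `nct_id` key (an assumption, as are the toy rows), shows why phase counts grow after the join: each drug row carries its study's phase.

```r
library(data.table)

# Toy study metadata and drug rows (hypothetical; column names assumed).
studies <- data.table(
  nct_id = c("NCT001", "NCT002", "NCT003"),
  phase  = c("Phase 1", "Phase 2", NA)
)
drugs <- data.table(
  id     = 1:4,
  nct_id = c("NCT001", "NCT001", "NCT002", "NCT003"),
  name   = c("aspirin", "placebo", "ibuprofen", "aspirin")
)

# Attach study metadata to every drug row.
merged <- merge(drugs, studies, by = "nct_id", all.x = TRUE)

# Counts are now per study-drug row, not per study.
merged_phase_counts <- merged[, .N, by = phase]
print(merged_phase_counts)
```

Here the single Phase 1 study contributes two rows because it involves two drugs.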
## Warning: Ignoring 1 observations
AACT drug names resolved to standard names and structures via SMILES.
## [1] "Drugs with resolved structure: 180555 / 197300 (91.5%)"
| overall_status | N |
|---|---|
| Completed | 114900 |
| Recruiting | 23262 |
| Terminated | 15384 |
| Unknown status | 15111 |
| Active, not recruiting | 10409 |
| NA | 5675 |
| Not yet recruiting | 5604 |
| Withdrawn | 5475 |
| Enrolling by invitation | 741 |
| Suspended | 739 |
## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"
## [1] "Mentions by study: 92966 / 99647 (93.3%)"
## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"
## [1] "PubChem CIDs with InChIKeys: 3801"
## [1] "ChEMBL compounds mapped via InChIKeys: 3332"
Select only activities with pChEMBL values, as a confidence filter.
## [1] "ChEMBL activities: 124438"
## [1] "ChEMBL activities molecules: 2287 ; targets: 3832 ; documents: 16198"
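The pChEMBL filter and the molecule, target, and document tallies might look like the following sketch. The column names follow common ChEMBL field naming but are assumptions here, and all identifiers are placeholders.

```r
library(data.table)

# Toy activities table (placeholder IDs; column names are assumptions).
activities <- data.table(
  molecule_chembl_id = c("MOL_A", "MOL_A", "MOL_B", "MOL_B"),
  target_chembl_id   = c("TGT_1", "TGT_2", "TGT_1", "TGT_3"),
  document_chembl_id = c("DOC_1", "DOC_1", "DOC_2", "DOC_3"),
  pchembl_value      = c(6.2, NA, 7.5, NA)
)

# Keep only activities that have a pChEMBL value.
act <- activities[!is.na(pchembl_value)]

print(sprintf("ChEMBL activities: %d", nrow(act)))
print(sprintf("ChEMBL activities molecules: %d ; targets: %d ; documents: %d",
              uniqueN(act$molecule_chembl_id),
              uniqueN(act$target_chembl_id),
              uniqueN(act$document_chembl_id)))
```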
## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"
## [1] "Organisms: 187"
| organism | N_targets |
|---|---|
| Homo sapiens | 1806 |
| Rattus norvegicus | 529 |
| Mus musculus | 238 |
| Bos taurus | 98 |
| Sus scrofa | 36 |
| Cavia porcellus | 26 |
| Escherichia coli K-12 | 19 |
| Oryctolagus cuniculus | 18 |
| Escherichia coli | 17 |
| Mycobacterium tuberculosis | 17 |
## [1] "Human targets: 1806"
| target_type | N |
|---|---|
| SINGLE PROTEIN | 1216 |
| PROTEIN COMPLEX | 247 |
| PROTEIN FAMILY | 210 |
| PROTEIN COMPLEX GROUP | 91 |
| PROTEIN-PROTEIN INTERACTION | 16 |
| SELECTIVITY GROUP | 14 |
| CHIMERIC PROTEIN | 12 |
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
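Narrowing targets to human single proteins can be sketched as two successive filters. The column names (`organism`, `target_type`, `uniprot`) and all rows below are illustrative assumptions.

```r
library(data.table)

# Toy target table (placeholder IDs and accessions; column names assumed).
targets <- data.table(
  target_chembl_id = c("TGT_1", "TGT_2", "TGT_3", "TGT_4"),
  organism         = c("Homo sapiens", "Homo sapiens",
                       "Rattus norvegicus", "Homo sapiens"),
  target_type      = c("SINGLE PROTEIN", "PROTEIN COMPLEX",
                       "SINGLE PROTEIN", "SINGLE PROTEIN"),
  uniprot          = c("ACC_1", "ACC_2", "ACC_3", "ACC_4")
)

# Human targets only, then tally by target type.
human <- targets[organism == "Homo sapiens"]
type_counts <- human[, .N, by = target_type]
print(type_counts)

# Single-protein targets, with a check on unique UniProt accessions.
single <- human[target_type == "SINGLE PROTEIN"]
print(sprintf("Human single-protein targets: %d ; unique UniProts: %d",
              nrow(single), uniqueN(single$uniprot)))
```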
## [1] " Tchem: 733" " Tclin: 341" " Tbio: 140"
## [4] " Tdark: 2"
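A per-level tally like the one above could be produced as in this sketch, assuming each protein row carries a target development level in a column named `tdl` (a hypothetical name; the rows are made up).

```r
library(data.table)

# Toy protein table with a development-level column (hypothetical data).
proteins <- data.table(
  uniprot = c("ACC_1", "ACC_2", "ACC_3", "ACC_4"),
  tdl     = c("Tchem", "Tclin", "Tchem", "Tbio")
)

# Count proteins per level, most frequent first.
tdl_counts <- proteins[, .N, by = tdl][order(-N)]

# Format each level as a "name: count" string.
print(sprintf(" %s: %d", tdl_counts$tdl, tdl_counts$N))
```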